DOCUMENT RESUME



ED 099 430



95 



TM 004 308



AUTHOR        Eash, Maurice J.; And Others
TITLE         Evaluation Designs for Practitioners. TM Report No. 35.
INSTITUTION   ERIC Clearinghouse on Tests, Measurement, and
              Evaluation, Princeton, N.J.
SPONS AGENCY  National Inst. of Education (DHEW), Washington, D.C.
REPORT NO     ETS-TM-35
PUB DATE      Dec 74
CONTRACT      OEC-0-70-3797-519
NOTE          6p.

EDRS PRICE    MF-$0.75 HC-$1.50 PLUS POSTAGE
DESCRIPTORS   *Decision Making; Design; *Education; Educational
              Research; Evaluation; Guides; *Program Evaluation;
              *Research Design


ABSTRACT 

Practitioners are not afforded the luxury of ideal
laboratory conditions. The natural settings of the classroom, the
school, or the school system place constraints on the type of data
obtainable; hence, educators must work with less than an ideal
experimental design. Four evaluation designs used in natural settings
are described. Each involves an evaluation study that takes into
account a variety of constraints, but nevertheless provides a basis
for subsequent program and/or organizational decisions. The study
includes a true experimental design in a field setting, a
nonequivalent control-group design, a time-series design, and a no
comparison group design. (Author/RC)



ERIC CLEARINGHOUSE ON TESTS, MEASUREMENT, & EVALUATION
EDUCATIONAL TESTING SERVICE, PRINCETON, NEW JERSEY 08540



TM REPORT 35 



DECEMBER 1974 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION



EVALUATION DESIGNS FOR PRACTITIONERS 

Maurice J. Eash
Harriet Talmage 
Herbert J. Walberg 






Planning and implementing any facet of the educational
program call for decision making, whether the project
concerns the program of an entire school system or the
day-to-day practice of a teacher in a single classroom.
The interactive nature of the educational process produces
a dynamic environment; hence, decisions made at
one point in time require reassessment at the next point
in time before another round of decisions can begin.
Evaluation provides a framework for building a systematic
data base to aid in making decisions in school
and classroom practice. With an appropriate data base,
problems can be reformulated, both potential and actual
consequences can be analyzed, and, as a result, the
processes can be redirected.

Practitioners are not afforded the luxury of ideal
laboratory conditions. The natural settings of the classroom,
the school, or the school system place constraints
upon the type of data obtainable; hence, educators must
work with less than an ideal experimental design.¹

Four evaluation designs used in natural settings are
described in the following sections.² Each involves an
evaluation study that takes into account a variety of
constraints, but nevertheless provides a basis for subsequent
program and/or organizational decisions. The studies
range from a true experimental design, one that necessitates
the random assignment of students to experimental
and control groups, to a design that lacks both
randomization and comparative groups.

In each section, the basic paradigm of the evaluation
design is symbolically presented. Four symbols identify
the elements of the paradigms: R = randomization; X =
treatment; O = observation; and, in some cases, DA =
design analysis. Subscripts denote specific treatments
and observations. Observations (O) to the left of the
treatment (X) denote pretest data, and to the right,
posttest data. The experimental group symbols appear above
the control group symbols. A broken line between the
groups indicates nonequivalent groups.

¹ The studies used to illustrate the designs were conducted by the Office
of Evaluation Research, College of Education, University of Illinois at
Chicago Circle.

² For additional designs, the reader may wish to consult Donald T.
Campbell and Julian C. Stanley, Experimental and Quasi-Experimental
Designs for Research. Chicago: Rand McNally & Company, 1963.



A True Experimental Design in a Field Setting

    R   X   O
    R       O

A true experimental design is characterized by its
randomization of subjects to treatment ("randomly
dividing the litter among treatments"), which is the
conventional laboratory-science way of exercising control.
The strength of the design, randomization for control of
error, is also a major source of difficulty in field
evaluations because studies are conducted where scheduling,
teacher preferences in assignment, luncheon arrangements,
and a myriad of other considerations enter in.
Thus one finds the experimental design infrequently
used in reported evaluations. However, because of its
power to bring forth more valid findings, we suggest that
evaluators search for ways to employ it in field situations.
An example drawn from an evaluation of a curriculum
model set up under a Title III grant illustrates the power
of a true experimental design to bare true differences and
the weaknesses of nonrandom comparison groups.

Clocktown, a fast-growing suburb in a major metropolitan
area, received a three-year grant to design a
middle school curriculum which would break sharply
with the conventional curriculums in the seven other
junior high schools. The new curriculum included: 1)
greater parent involvement, 2) a more humanistic
orientation, 3) promoting greater achievement, 4) promoting
more affective growth, 5) integrating pupil personnel
services within the curriculum, and 6) offering
these changes at a per-pupil cost competitive with the
costs in the other junior high schools. After one year of
planning, the two-year experimental school opened.



This publication was prepared pursuant to a contract with the National Institute of Education, U.S. Department of Health, Education,
and Welfare. Contractors undertaking such projects under government sponsorship are encouraged to express freely their judgment in
professional and technical matters. Points of view or opinions do not, therefore, represent official National Institute of Education
position or policy.



Through a combination of events and advanced planning,
a true experimental design became possible. A pool
of 600 potential students for the Model School was
developed through volunteers and recruitment. The
Model School was established to enroll 300 students, and
all applicants were informed that a random selection
would govern admission to the school. The outside
evaluators randomly selected the 300 students, thereby
creating an experimental group (those in the Model
School) and a control group (those who were in the
original pool of applicants but were not admitted to the
school).
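
The lottery described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the applicant IDs, the function name `assign_by_lottery`, and the seed are hypothetical, and the report does not say how the evaluators actually carried out the draw.

```python
import random

def assign_by_lottery(applicant_ids, n_admit, seed=None):
    """Randomly split an applicant pool into experimental and control groups."""
    rng = random.Random(seed)                      # seeded so the draw can be audited
    admitted = set(rng.sample(applicant_ids, n_admit))
    experimental = [a for a in applicant_ids if a in admitted]
    control = [a for a in applicant_ids if a not in admitted]
    return experimental, control

# Hypothetical pool of 600 applicant IDs; 300 places in the Model School.
applicants = list(range(1, 601))
experimental, control = assign_by_lottery(applicants, 300, seed=1974)
print(len(experimental), len(control))
```

Because every applicant has the same chance of admission, the two groups differ only by chance before the treatment begins, which is what lets later differences be attributed to the curriculum.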

A number of measurements were taken to evaluate the
goals of the Model School. Whenever possible, the
results were analyzed within the experimental design of
Experimental Group vs. Control Group. One example of
the strength of the experimental design over a
quasi-experimental comparison group design is shown in
Figure 1, where achievement test scores for the Model
School, the control schools' students, and the district
average for all junior high school eighth graders are
graphed. This graph shows dramatic differences in
curriculum treatment between the experimental and the
control groups in selected areas of mathematics and
reading achievement. If the district averages had been
substituted for the control group results, much of the
effect of the curriculum change would have been
obscured, for clearly the achievement of the pool of
students is not representative of the district's average
achievement.
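
The arithmetic behind this point can be made concrete with a small sketch. The three means below are invented for illustration and are not read from Figure 1; the function name `estimated_effect` is likewise hypothetical.

```python
def estimated_effect(treatment_mean: float, comparison_mean: float) -> float:
    """Difference between the treatment mean and a comparison mean."""
    return treatment_mean - comparison_mean

# Illustrative values only; they are NOT taken from Figure 1.
model_school_mean = 90.0      # experimental group
lottery_control_mean = 82.0   # applicants not admitted by the random draw
district_mean = 88.0          # all eighth graders in the district

effect_vs_control = estimated_effect(model_school_mean, lottery_control_mean)
effect_vs_district = estimated_effect(model_school_mean, district_mean)
print(effect_vs_control, effect_vs_district)
```

With these made-up numbers the randomized comparison shows an 8-point difference, while the district comparison shows only 2 points. The gap between the two estimates reflects how the applicant pool differs from the district as a whole, not anything about the curriculum.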



[Figure 1. Achievement test scores for the Model School, the control
group, and the district average. The vertical axis runs from 75 to 95.]



A second example of how true differences are masked
is seen when a volunteer group instead of a randomly
selected control group is used in a comparison of
classroom observations. In year one, the control observations
were made in volunteer teachers' classrooms. In year two,
the control group of classrooms to be observed was
randomly selected to obtain a more representative sample
of classroom practice to compare with the Model School.
The differences in year two are much sharper, since the
first-year volunteer control group classrooms
were much closer to the experimental group in practice
than were the typical district junior high school
classrooms. (See Figure 2.) The experimental design is
invaluable for controlling error and for attributing
results to treatment more clearly. Every effort should be
made to use it when the question of curriculum effects is
at issue or a summative evaluation is at stake.



[Figure 2. Caption largely illegible in the source; it names Control
Classrooms and Volunteer Classrooms.]




Nonequivalent Control-Group Design

    O   X   O
    - - - - -
    O   X   O



It is usually difficult to assign students randomly to
classrooms receiving special treatment or to assign
teachers randomly within schools to special programs. In
the first instance, parents tend to resist changes that vary
from the established curriculum without their approval.
In the second instance, teachers assigned to new programs
involuntarily may affect the outcomes negatively.
Through a nonequivalent control-group design, the
handicap due to the lack of randomization is compensated
for in several ways.

The Textville School District study concerned the
problems of evaluating four new reading series to select
one for system-wide adoption. Instructional materials
play a significant role in the educational process: 75
percent of the instructional time in the classroom and 90
percent of the homework time are devoted to these
materials. Thus, adoption cannot be taken lightly.
Selecting a reading series frequently entails ideological
confrontation to the neglect of facts. Publishers display
their materials with attractive illustrations and slick
copy, and groups of teachers espouse one approach to
reading instruction or another as the final solution to all
reading problems. Therefore, an evaluation design was
developed to serve two purposes: 1) to overcome the
difficulties of nonrandomization and 2) to establish a data
base for making selection decisions on the basis of facts
rather than ideological quibbling.

In designing the evaluation study, the drawbacks of
nonrandom assignment of students and teachers to
experimental and control groups were taken into
consideration by obtaining pretest and posttest data,
employing multiple treatments for comparisons with the
traditional treatment and comparisons among the
treatments, and using the class rather than the individual
student as the unit of study. An adaptation of the
nonequivalent control-group design is illustrated in Figure 3.
Pretest (O_preT) and posttest (O_postT) reading achievement
data were obtained. Data on teacher characteristics (O_1)
were initially collected. Subsequent to the introduction of
the treatment (X), data were obtained on learning
environment variables (O_2: competitiveness, cohesiveness,
difficulty, friction, and satisfaction) and on instructional
characteristics (O_3: locus of instructional decisions,
variety and utilization of materials, and student
behaviors).
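
Using the class rather than the individual student as the unit of study means collapsing student scores to one row per class before any comparison. A minimal sketch follows, assuming student records arrive as (class, pretest, posttest) tuples; the record layout, the class labels, and the scores are hypothetical, and the report does not describe the authors' actual data handling.

```python
from collections import defaultdict
from statistics import mean

def class_means(records):
    """Collapse student-level (class_id, pretest, posttest) rows to one row per class."""
    by_class = defaultdict(list)
    for class_id, pre, post in records:
        by_class[class_id].append((pre, post))
    return {
        class_id: (mean(p for p, _ in rows), mean(q for _, q in rows))
        for class_id, rows in by_class.items()
    }

# Hypothetical student records: (class, pretest score, posttest score).
records = [("A", 40, 50), ("A", 60, 70), ("B", 55, 57), ("B", 45, 53)]
means = class_means(records)
gains = {c: post - pre for c, (pre, post) in means.items()}
print(means, gains)
```

Each class then contributes a single pretest mean and a single posttest mean, so a design with 60 classes is analyzed as 60 observations regardless of how many students were tested.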

The Textville schools and teachers were encouraged to
participate in the study. Sixty classes from 12 schools
were chosen and represented the range of ability, of
socioeconomic, racial, and ethnic backgrounds, and of
geographic locations found in the district. Assignment to
a reading series by grade level is shown in Figure 4. For
each reading series, the materials were field-tested in
three different schools in grades 1, 2, 3, and 6. In all, the
data included 12 different classes per series.

Figure 3. Nonequivalent Control-Group Design Paradigm

    O_preT   O_1   X_1         O_postT   O_2   O_3
    O_preT   O_1   X_2         O_postT   O_2   O_3
    O_preT   O_1   X_3         O_postT   O_2   O_3
    O_preT   O_1   X_4         O_postT   O_2   O_3
    O_preT   O_1   X_control   O_postT   O_2   O_3

Figure 4. Assignment to Treatment Matrix

    GRADE           SERIES   SERIES   SERIES   SERIES   SERIES
    LEVEL           X_1      X_2      X_3      X_4      X_control   TOTAL
    1ST               3        3        3        3         3          15
    2ND               3        3        3        3         3          15
    3RD               3        3        3        3         3          15
    6TH               3        3        3        3         3          15
    TOTAL CLASSES    12       12       12       12        12          60
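
The marginal totals of the Figure 4 matrix follow directly from three classes per cell, and a short sketch makes the check explicit. It assumes the fifth column is the traditional (control) series; the label `X_control` and the variable names are illustrative choices, not the authors' notation for that column.

```python
GRADES = ["1st", "2nd", "3rd", "6th"]
SERIES = ["X1", "X2", "X3", "X4", "X_control"]   # fifth label is an assumption
CLASSES_PER_CELL = 3                               # one class in each of three schools

# Build the grade-by-series assignment matrix.
matrix = {g: {s: CLASSES_PER_CELL for s in SERIES} for g in GRADES}

row_totals = {g: sum(matrix[g].values()) for g in GRADES}
col_totals = {s: sum(matrix[g][s] for g in GRADES) for s in SERIES}
grand_total = sum(row_totals.values())
print(row_totals, col_totals, grand_total)
```

Each grade row totals 15 classes, each series column totals 12, and the grand total is the 60 classes named in the text.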

Two constraints were imposed on the design: 1) All
four classes in a school field-testing the reading materials
must use the same series; and 2) the best educational
interest of the students must supersede the design of the
study. And, indeed, this came to pass: One class found
too many difficulties with the series at the peril of
impeding their reading progress, and the class was removed
from the study.

The data were analyzed to provide information on four
questions:

• Do the classes using one series obtain higher
reading scores on the reading achievement posttest
than classes using another series?

• Do the classes using one reading series perceive
their learning environment differently than do
classes using another reading series? Do the
learning environment and reading series taken
together affect achievement?

• Do selected teacher characteristics in conjunction
with a given series affect reading achievement?

• Does instruction differ in classes using different
reading series?

Statistical analyses indicated that the pretest score is
the single most significant predictor of reading
achievement despite teacher characteristics and regardless of
the reading series. After the effects of the pretest scores
are removed, competitiveness is the only other variable
that predicts reading achievement. The higher the
competitiveness in the learning environment, the lower the
reading achievement. There are no significant correlations
between competitiveness and reading series,
teacher characteristics, or instructional characteristics.
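
One common way to "remove the effects of the pretest" is to regress posttest on pretest and then relate what is left over to another variable. The sketch below does this with class-level data. It is an assumption about the general technique, not a description of the authors' analysis, and all scores are invented so that the arithmetic is easy to follow.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """Residuals of y after a simple least-squares fit on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical class-level data (one entry per class).
pretest = [40, 50, 60, 70, 40, 50, 60, 70]
competitiveness = [1, 1, 1, 1, 3, 3, 3, 3]
posttest = [p + 10 - 2 * c for p, c in zip(pretest, competitiveness)]

adjusted = residuals(posttest, pretest)        # posttest with pretest removed
r = pearson(adjusted, competitiveness)
print(round(r, 3))
```

In this constructed example the adjusted posttest scores fall as competitiveness rises, which is the direction of the relationship reported for Textville. Real data would show a weaker correlation and would need a significance test.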

The final selection decision for the Textville School
District shifted away from an emphasis on ideological
issues such as phonics-oriented vs. nonphonics-oriented
or linguistic vs. nonlinguistic reading approaches.
In place of these, attention was
focused on the instructional aspects of a reading
program that tend to reduce competitiveness, and on
such concerns as the district's philosophy of reading, cost
factors, implementation problems, and the degree of
teacher dependence on outside support.



Time-Series Design

    O   X   O

Practitioners are frequently faced with the necessity of
making major program changes which reorganize
curriculum and structural arrangements. Not infrequently,
such changes are precipitated by external forces that are
impatient with the setting up of an evaluation design that
would require the establishing of control groups before
the change is made. In these cases, data are frequently
desperately needed by administrative decision makers if
they are not to be at the mercy of rumor and pressure
groups. Such was the case of the Parkland School
District, which was suddenly under a legal mandate to
integrate its schools. De facto segregation resulting from
segregated housing placed practically all the black
population in one elementary school and the white population
in six schools, and produced segregation up through
grade 6. The junior high schools were integrated in
name, but not always in reality, for the students
segregated themselves by race in the lunchroom and on the
playground. Faced with a legal mandate to bus students
to achieve equal racial proportions in all seven
elementary schools, Parkland administrators requested an
outside evaluator to help them set up an evaluation
design that would provide basic data on these questions:
1) What effects does the structural reorganization
required by busing have on student achievement and on
the learning environment? 2) What data would be useful
for program planning and for alerting the administration
to potential difficulties?

The evaluation was hampered by the inability to set up
control groups through randomization. Moreover, since
the entire school system was involved, no separate
control groups were available. Within these limitations,
it was decided to use a time-series design over a two-year
period that would allow within-the-group comparisons,
use a multiple collection of data, and give a reading on
several indices of progress. Experience indicates that
over the two years many productive hypotheses were
generated and an invaluable data base for charting
progress in achievement, race relations, and classroom
instruction was established.

A pretest and posttest on general achievement were
given to every child in the fall and spring. Since
previous local norms were available, these data quieted fears
that integration was destroying achievement. A learning
environment measure, administered in the spring,
revealed that further curriculum planning was needed to
improve the learning environment for both white and
black students in different schools. An analysis of the
learning environment and achievement measures
revealed that some schools appeared to be much more
successful than others in providing a stimulating
learning environment and promoting achievement. While
the lack of adequate controls limited generalizations or
conclusions, these data did pinpoint areas for closer
investigation by administrators and teachers. One of the
more immediately useful applications of evaluation data
came when rumors of the deterioration of discipline in
one school swept the community. The recently
administered learning environment inventory profile
calmed both the school board and the public by its
demonstration that the students in this school perceived
their environment very much as did their counterparts in
other schools, and that there was no greater conflict or
disruption in their school than in the others.

A third area of data was an analysis of the records of
disciplinary cases in the junior high schools. These again
provided some short-term data as the basis of decisions,
since the offenses that took up most of administrators'
and guidance counselors' time were being committed by
a very small group of students. (See Figure 5.)
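
A finding of this kind comes from tallying incidents by student and asking what share the most frequent offenders account for. The sketch below shows one way to do that; the incident log, the student labels, and the function name `concentration` are hypothetical, and the figures are not drawn from the Parkland records.

```python
from collections import Counter

def concentration(incident_student_ids, top_n):
    """Share of all incidents attributable to the top_n most frequent students."""
    counts = Counter(incident_student_ids)
    top = counts.most_common(top_n)
    return sum(n for _, n in top) / len(incident_student_ids)

# Hypothetical log: one student ID per recorded disciplinary incident.
log = ["s1"] * 12 + ["s2"] * 9 + ["s3"] * 3 + ["s4", "s5", "s6", "s7", "s8", "s9"]
share = concentration(log, top_n=2)
print(round(share, 2))
```

In this invented log, two of nine students account for 21 of 30 incidents. A tally like this points toward concentrating services on a few students rather than making school-wide changes.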



[Figure 5. Chart of disciplinary cases in the junior high schools. The
horizontal axis runs from 0 to 26; axis labels and caption are illegible
in the source.]



Interracial problems were not as prevalent as intraracial
problems. A second year of charting these behavioral
incidents showed that the concentration of social services
on the few major offenders had removed them from the
behavioral records in the second year. In addition, it was
found that interracial conflict had decreased. Thus, one
is led to conclude that the time-series design provides a
useful data base for decision making in a situation where
tensions induced by structural changes cry for the voice
of rationality. One must admit that these data have
limited generalizability, but they have been invaluable in
the context in which they were collected and in
demonstrating that evaluation can serve several purposes in
applied settings.



No Comparison Group Design

    Design Analysis
    X - O
    X - O
    Design Analysis



Not infrequently, an evaluator is confronted with a
program that is to be used but is being undertaken with
restraints that forestall the use of control groups. Is the
usefulness of evaluation forestalled under these
circumstances, and must one retreat to the rhetoric of
castigating shortsightedness in the developer? The fourth
example deals with such a problem.

An outside private agency provided funds to increase
and improve the teaching of the arts in schools. Having
launched the program from very broad objectives ("to
enable parents and community leaders to use the arts as
communication tools"), the agency requested evaluation
assistance to improve the series of workshops that it had
designed for teachers.

From the workshop guides that were presented and
from the funding proposal, an analysis of workshop
activities to achieve the goals was prepared. The activities
proved to be a better source of goals than the diffuse
general objectives. The evaluation design was concerned
with three questions: 1) Were the activities being taught
in the workshops? 2) Were they perceived as useful by
teachers, since they incorporated creative and
nonconventional teaching approaches? 3) Were they being
implemented in classrooms, and did the implementation
maintain the integrity of the activities?

The evaluators were not permitted to gather data from
control classes in the schools, nor were they to observe
the instructors assigned to the workshops. The design of
the evaluation structured the gathering of data by
analyzing the program and developing an activity
analysis, which was then converted into an instrument to
be used by teachers to evaluate workshop activities on
four dimensions: 1) the workshop participants' reaction,
2) whether teachers used any one activity in the
classroom, 3) the students' reactions, and 4) ease of
implementation. A second source of data was gathered from a
pretest and a posttest of learning environments in the
workshop participants' classrooms. A third source of
data was observation in classrooms where teachers
taught the workshop activities to their students. A fourth
source of data was a standardized teacher evaluation
questionnaire to evaluate the workshops.
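
The four-dimension instrument lends itself to a simple per-activity summary. The sketch below assumes a 1-to-5 rating scale for three dimensions and a 0/1 flag for classroom use; the scale, the activity names, and the field names are all hypothetical, since the report does not reproduce the instrument.

```python
from collections import defaultdict
from statistics import mean

# The four dimensions named in the text; the identifiers are illustrative.
DIMENSIONS = (
    "participant_reaction",
    "used_in_classroom",
    "student_reaction",
    "ease_of_implementation",
)

def summarize(responses):
    """Average each dimension per activity.

    responses: iterable of (activity_name, {dimension: rating}) pairs.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for activity, ratings in responses:
        for dim in DIMENSIONS:
            grouped[activity][dim].append(ratings[dim])
    return {
        activity: {dim: mean(values) for dim, values in dims.items()}
        for activity, dims in grouped.items()
    }

# Hypothetical teacher responses for two invented workshop activities.
responses = [
    ("mask making", dict(zip(DIMENSIONS, (5, 1, 4, 3)))),
    ("mask making", dict(zip(DIMENSIONS, (3, 0, 2, 3)))),
    ("sound collage", dict(zip(DIMENSIONS, (4, 1, 5, 2)))),
]
summary = summarize(responses)
print(summary["mask making"])
```

A table of such averages is the sort of output that could support recommendations about which activities were most useful in the classroom.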

From these data an analysis was made of the workshops,
and recommendations were rendered on which
workshops and what activities were most useful in the
classroom. As this evaluation progressed, feedback
sessions were held with workshop directors to assist them
in conducting the next semester's workshops. Evaluation
in this case focused on providing data to a group of
program developers who were working in an ambiguous
area. Although many of the traditional [illegible] of an
evaluation design are lacking, there are data to be
generated and comparisons that can be made to shape
the educational product. In the sense of serving to
improve practice through the establishment of a data base
and promoting meaningful comparisons for practitioners,
the evaluation design remains true to its calling
in bringing rationality to play on educational activities.

Cooperative Planning in Evaluation

Evaluation is often viewed by practitioners as being
outside their reach: The designs are incomprehensible,
the data are too costly to gather, the participants are
threatened by the potential of the findings, and the
effects, efforts, and efficiency cannot be evaluated with
any degree of objectivity anyway. Our experience,
gathered over a wide variety of projects, indicates
that practitioners are handicapped by too narrow a view
of evaluation and by their failure to systematically build
an evaluation design into projects. Moreover,
troublesome problems are not approached through an
evaluation design which in its use converts rhetoric to a factual
base, as was illustrated in the example on the reading
series. In short, decision making and choice taking are
blind through the lack of evaluation designs which open
up options and permit an earlier use of correctives in
program planning.

To provide for an evaluation design in the early stages
makes for a more open commitment to the major goals of
a project, and establishes a degree of latitude for shifting
direction based on evidence, which often is denied when
the program participants' personal commitments to a
project deepen with effort. Evaluation can serve to keep
the focus on the quest for a better way to provide
education as opposed to espousing a dogma of "the way" to
provide quality education. If evaluation is seen as a
necessary part of projects and problem solving, the use of
evaluators and evaluation findings becomes as significant
as the appropriate use of evaluation designs. Findings
must be implemented to be effective in decision making.

At the Office of Evaluation Research, we have found
that cementing an early working relationship between
the evaluators and the practitioners is the best guarantee
of the use of evaluation findings. For outside evaluators,
this entails building an evaluation design early in the
project with inputs from practitioners on their needs for
data. In another context, we have referred to this process
as a coactional relationship,³ where the two parties are
engaged in a mutual task with a commitment to the
discovery of options and the search for truth. Extra effort is
required from the evaluators to explain designs and their
strengths and constraints, but these early sessions also
build the foundation of commitment to follow the
findings wherever they may lead. The process is coactional in
that the evaluators perceive the context in which the
evaluation design is being used and plan early on
for implementation of findings at appropriate junctures.
Our contention is that many evaluation reports are
superfluous because they are ill-timed to the schedule of
information needs of practitioners, or return findings
that are arcane and remote from the decisions that are
pressing the decision maker. We see, as imperative to
success, the need to be sensitive to the roles of the
evaluators and practitioners and their relationships in
building evaluation designs. The four evaluation designs
described illustrate applications of a methodology in a
field context. Brevity did not permit the description of
roles and relationships, though they are implicit in the
applications. Appropriate use of evaluation designs, we
contend, can bring rationality into play in field-based
problems and can improve educational practice.

³ Maurice J. Eash, "Transactional Evaluation of Classroom Practice,"
in Studies in Transactional Evaluation, ed. Robert M. Rippey.
Berkeley: McCutchan Pub. Corp., 1973.



